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SUMMARY 

This paper presents a procedure for computing the aeroelasticity of wings 
on parallel multiple-instruction, multiple-data (MIMD) computers. In this proce- 
dure, fluids are modeled using Euler equations, and structures axe modeled using 
modal or finite element equations. The procedure is designed in such a way that 
each discipline can be developed and maintained independently by using a domain 
decomposition approach. In the present parallel procedure, each computational 
domain is scalable. A parallel integration scheme is used to compute aeroelastic re- 
sponses by solving fluid and structural equations concurrently. The computational 
efficiency issues of parallel integration of both fluid and structural equations are 
investigated in detail. This approach, which reduces the total computational time 
by a factor of almost 2, is demonstrated for a typical aeroelastic wing by using 
various numbers of processors on the Intel iPSC/860. 

INTRODUCTION 

In recent years, significant advances have been made for single-discipline use 
of both computational fluid dynamics (CFD), using finite-difference approaches, and 
computational structural dynamics (CSD), using finite-element methods. In single 
disciplines, computations have been made on complete aircraft. However, only a 
limited amount of work has been completed in coupling these two disciplines for 
multidisciplinary applications. The prime reason is the lack of computational power 
for combining the two major computational fields. The development of a new gen- 
eration of parallel computers can possibly alleviate the restriction of computational 
power. 

A multidisciplinary code for computing unsteady flows and aeroelastic re- 
sponses of aerospace vehicles, ENSAERO, has been developed on serial supercom- 
puters at the Computational Aerosciences Branch of the NASA Ames Research 
Center (ref. 1). This multidisciplinary code computes unsteady aerodynamic re- 
sponses of aircraft using the Euler/Navier-Stokes equations. The modal or finite 
element equations are used to model structures. An aeroelastic shape-conforming 
moving grid is used to include the effect of structural deformations on unsteady 
flows. This code is designed in a modular fashion to adopt several different nu- 
merical schemes suitable for accurate aeroelastic computations. The basic coding 
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of ENSAERO can accommodate zonal grid techniques for efficient modeling of full 
aircraft. 

An early version of ENSAERO (ref. 2) has been successfully applied in com- 
puting aeroelastic responses of a rectangular wing by using the Euler equations 
for fluids and the modal equations for structures. The result demonstrates that 
the code can accurately predict the flutter dynamic pressure of a rectangular wing. 
The code was extended to compute aeroelastic responses using the Navier-Stokes 
equations for fluids (ref. 3). Later, it was updated by utilizing an upwind algorithm, 
and the code has been applied to fighter wings undergoing unsteady motions (refs. 4 
and 5) at moderately large angles of attack. This code also has a capability of mod- 
eling moving control surfaces (ref. 6). Furthermore, ENSAERO has demonstrated 
the capability to simulate transonic flows on wing-body configurations using the 
Navier-Stokes equations (ref. 7). 

For simple geometries such as clean wings, the modal approach can produce 
accurate response results. However, the modal approach may be less accurate for 
complex structures such as wing-body configurations. In order to accurately rep- 
resent aeroelastic responses of general wing-body configurations, the finite element 
equations for structures have been implemented in ENSAERO. A typical wing- 
body configuration has been used to demonstrate aeroelastic responses at transonic 
Mach numbers using the Navier-Stokes equations for fluids and the finite element 
equations for structures (ref. 8). Two-noded beam elements are used to model the 
wing-body structures. Each node has three degrees of freedom (DOF) corresponding 
to transverse displacement and to transverse and torsional rotations, respectively. 

In the past, all computations were accomplished serially, on computers such 
as the Cray Y-MP at Ames Research Center. Currently, a version of ENSAERO 
(ref. 9) that uses the Euler equations for fluids and the modal equations for struc- 
tures has been parallelized on the Intel iPSC/860 at Ames. The Intel iPSC/860 is 
a distributed- memory, multiple-instruction, multiple-data (MIMD) computer with 
128 processors. In this parallel implementation, a domain decomposition approach 
is used in which the fluid equations and the structural equations are modeled in sep- 
arate computational domains. Each domain is mapped individually onto a group of 
processors, referred to as a cube on the Intel iPSC/860. As a result, each discipline 
can be developed and implemented independently of the others. However, because 
of the coupling between the disciplines, there is a need to exchange data, such as 
pressures and structural deformations at interfaces. This exchange between the 
fluid and structural domains is accomplished through an intercube communication 
mechanism (ref. 10), which enables different processors in each cube to communicate 
directly. 

In this work, procedures to compute aeroelastic responses on MIMD parallel 
computers using direct coupling of the Euler equations for fluids and the modal or 
finite element equations for structures are investigated for wings. The implementa- 
tion of the structural domain on the iPSC/860 is described in detail. In addition, the 
computational efficiency issues of parallel integration of both fluid and structural 
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equations are investigated in detail. The proposed integration scheme exploits the 
architecture of MIMD computers. This research will provide an efficient procedure 
for aeroelastic analysis on MIMD computers. 

This work was completed using the resources of the Numerical Aerodynamic 
Simulation (NAS) Program at NASA Ames Research Center. The work by C. Byun 
was supported by NASA Ames Research Center under Cooperative Agreement 
Number NCC2-740. 

GOVERNING AERODYNAMIC EQUATIONS 

The strong conservation- law form of the Euler equations is used for shock- 
capturing purposes. The Euler equations in generalized coordinates can be written 
as (ref. 11) 

d r Q + d$E + d v F + d(G = 0 (1) 

where Q, E, F, and G are flux vectors in generalized coordinates. The following 
transformations are used in deriving equation (1): 

T = t 

£-£{x,y,z,t) 

rj = r](x, y, z, t) 

C = C {x,y,z,t) 

To solve equation (1), ENSAERO has time-accurate methods based on both 
central-difference and upwind schemes (ref. 12). In this work, the central-difference 
scheme based on the implicit approximate factorization algorithm of Beam and 
Warming (ref. 13) with modifications by Pulliam and Chaussee (ref. 14) for diago- 
nalization was used. This scheme is first order accurate in time. 

AEROELASTIC EQUATIONS OF MOTION 

The governing aeroelastic equations of motion for structures can be written 
as 

[A/]® + [<?]{?} + [*]{?} = {Z} (3) 

where [ M ), [C], and [K] are the global mass, damping, and stiffness matrices, 
respectively, and where {Z} is the aerodynamic force vector corresponding to the 
displacement vector {<?}. These quantities can be expressed in modal coordinates 
or finite element coordinates depending on the method used to obtain structural 
dynamic responses. One of the main efforts is concerned with the computation of 
the global force vector {Z} of equation (3). In this work, for a given time t, {Z} 
is computed by solving the Euler equations. From the solution of equation (1), 
pressure coefficients are computed at all grid points on the wing surface. Using 
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these pressure coefficients, the force vector { Z } is calculated by means of the modal 
matrix or a fluid-structural interface, which is described in the next section. 

The aeroelastic equations of motion (3) have been solved in past work (refs. 2 
and 8) by a numerical integration technique based on the linear acceleration method 
(ref. 15). This method has been successfully used for integrating the modal equa- 
tions, assuming a linear variation of the acceleration. However, this integration 
method requires a very small time step in order to integrate the finite element 
equations of motion. As a result, for the finite element equations of motion, the 
constant-average-acceleration method (ref. 16) is adopted to increase the time-step 
size. This method is an extension of the linear acceleration method. Assuming a 
constant average acceleration on a time interval, the velocities and displacements 
are obtained at a time t as 


{?}t — {?}t-At + 
{?}t = {<?}t-At + 


~2 + 
At {<7}t-At + 



{«}« 

({?}t + {?}(— At) 


(4) 


Using these equations, the displacements at the end of a time interval can be ob- 
tained by solving 

PK<?}t = { Z} t + [M]{a} + [C}{v] (5) 

where 

4 4 

{ a } = ^2 {?}t-At + ^{<7}t-At + {<7'}t-At 
2 

{ v } = ^{?}t-At + {?}t— At 

This is an unconditionally stable scheme, whereas the linear acceleration method is 
conditionally stable. 


FLUID-STRUCTURAL INTERFACES 


In aeroelastic analysis, it is necessary to represent the equivalent aerody- 
namic loads at the structural nodal points and to represent the deformed structural 
configurations at the aerodynamic grid points. In the present domain decomposition 
approach, coupling between the fluid and structural domains is achieved by inter- 
facing the boundary data, such as aerodynamic pressures and structural deflections, 
at each time step. An analytical moving-grid technique has been successfully used 
to deform the aerodynamic grid according to the structural deflections at the end 
of every time step (refs. 1, 2, and 8) There are different approaches for obtaining 
the global force vector {Zj of equation (3), depending on the equations used for 
the structural dynamic analysis. 
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For the modal equations of motion, the global force vector can be easily 
obtained in terms of the preselected mode shapes (modal matrix) as 

{Z} = (6) 

where [$] is the modal matrix and [A] the diagonal area matrix of the aerodynamic 
control points. The unsteady differential pressure coefficients on the wing surface 
are defined as {A C p ). It is noted that the modal matrix is also used to represent 
the deformation of the wing. 

Solution of the equations of motion based on the finite element discretization 
requires a fluid-structural interface similar to the modal matrix. Several numerical 
procedures have been developed for exchanging the necessary information between 
the fluid and structural domains (refs. 17-19). However, in this study, a linear 
interpolation scheme is first developed for the interface so that coupling of fluid 
and structural equations could be simple for implementation on the new parallel 
computers. This scheme is called the lumped load interface. In this method, the 
force acting on each element of the structural mesh is first calculated and then 
the element nodal force vector is obtained by distributing the force. The global 
force vector is obtained by assembling the nodal force vectors of each element. In 
addition, the deformed configuration of the CFD grid at the surface is obtained by 
linearly interpolating nodal displacements at finite element nodes. 

Next, a mapping matrix developed by Appa (ref. 18) is selected to accurately 
exchange data between the fluid and structural interface boundaries. The reason 
for selecting Appa’s method is that the mapping matrix is general enough to accom- 
modate changes in fluid and structural models easily. In addition, this approach 
conserves the work done by aerodynamic forces when obtaining the global nodal 
force vector. This method introduces a virtual surface between the CFD surface 
grid and the finite element mesh for the wing. The virtual surface is discretized by 
a number of finite elements, which are not necessarily the same elements used in 
the structural surface modeling. This method is called the virtual surface interface. 

By forcing the deformed virtual surface to pass through the given data points 
of the deformed structure, a mapping matrix relating displacements at structural 
and aerodynamic grid points is derived as 

[T] = [V>a] (<5 _1 [^] + [fps] T [^s]) 1 bPs) T ( 7 ) 

where 

[K] is the free-free stiffness of the virtual surface 
xp 9 is the displacement mapping from virtual to structural grids 
ip a is the displacement mapping from virtual to aerodynamic grids 
6 is the penalty parameter 


5 


Then, the displacement vector at the aerodynamic grid {g 0 } can be expressed in 
terms of the displacement vector at the structural nodal points q 3 as 

{?.} = m {«.} 

Prom the principle of virtual work, the nodal force vector { Z a } can be obtained as 

{ 2 = m T {Z4 

where { Z a } is the force vector at the aerodynamic grids (ref. 20). 

PARALLEL IMPLEMENTATION OF THE 
AEROELASTIC EQUATIONS 

The domain decomposition approach used in this study enables data struc- 
tures and solution methods for fluid and structural equations to be developed inde- 
pendently. Fluid and structural equations are modeled in separate computational 
domains. However, coupling of the disciplines requires the exchange of the interface 
boundary data, which is accomplished through an intercube communication mech- 
anism (ref. 10). This intercube communication facility enables different processors 
in each cube to communicate directly on the iPSC/860. It is important to keep the 
specification of data exchange routine the same on both computational domains. 

The domain for fluids is capable of solving the Euler equations using 3-D 
uni-partitioning of the computational domain. The uni-partitioning scheme denotes 
that one grid subdomain is assigned to each of the processors. The arrangement 
of processors is described in figure 1. The arrows denote bi-directional data com- 
munication. There are a variety of concurrent algorithms available for solving the 
system of equations for fluids. Currently, the solver for the fluid equations can 
use three different concurrent algorithms: (1) complete exchange- based implemen- 
tations (CE-GE), (2) pipelined-Gaussian elimination (PGE), and (3) substructured 
Gaussian elimination followed by solution of the reduced system by means of bal- 
anced odd-even cyclic reduction (SGE-BCR) (ref. 21). In this work, the one-way 
PGE scheme is used. The choice of algorithms was made largely on the basis of 
memory use. The one-way PGE method allows the use of larger computational 
grids, or of fewer processors, than do the other schemes. More details about the 
implementation of the fluid domain can be found in reference 21. 

For the structural domain, modal equations were first used on the iPSC/860. 
Since a limited number of preselected mode shapes were used, only a single processor 
was assigned to the structural domain. However, in replacing modal equations with 
the finite element equations, it is necessary to use a cube of multiple processors for 
structures. In this study, it is assumed that each subdomain of the entire structure 
is mapped onto a single processor. Each processor stores only the information 
relevant to the subdomain assigned to it. The information can be the stiffness and 
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mass matrices and the applied nodal force vector of a subdomain. Then equation (5) 
is expressed at the subdomain level. 

In this work, a regular finite element mesh is used to model wings as a plate, 
and the domain decomposition is made by using 2-D uni-partitioning as shown 
in figure 1. This type of domain decomposition enables an efficient and simple 
message communication mechanism within the structural domain. Only chordwise 
and spanwise bi-directional messages are exchanged along the subdomain interfaces 
in the structural domain. 

At present, the solver for the structural domain is based on a Jacobi- 
preconditioned conjugate gradient (JPCG) algorithm on the Intel iPSC/860. The 
present JPCG algorithm is based on a parallel conjugate gradient algorithm pro- 
posed by Law (ref. 22). The algorithm is described in the appendix for completeness. 
The advantage of Law’s algorithm is that it does not form the global system matri- 
ces. In this method, the multiplication of a matrix by a trial vector, which is the 
major operation of the conjugate gradient algorithm, is performed at the subdomain 
level. The interprocessor communication is confined to the solution phase. 

INTEGRATION SCHEMES FOR COUPLED DOMAINS 

In a serial computer, the integrations of both fluid and structural equations 
are performed one after the other in a sequential nature. Figure 2(a) shows the 
sequential integration scheme implemented on MIMD computers. In the sequential 
integration scheme, the fluid domain has to wait to proceed to the next time step 
until it receives information about structural deformations. The structural domain 
also has to wait for surface pressure data, so both cubes have their own idle times 
while they wait for data communications. However, since the size of the structural 
equations was small for modal analysis, the effect of the waiting time was negligible 
relative to the computational time per integration step. 

On the contrary, if a large number of modes or direct finite element equations 
are used in order to accurately predict dynamic responses of complex structures, 
the computational time per integration step may be increased rapidly. This is due 
to the increase in the idle time when a sequential integration scheme is used on 
the iPSC/860. However, this situation can be avoided by executing the integration 
of both fluid and structural equations concurrently as shown in figure 2(b). In the 
proposed parallel integration scheme, both solvers start computations independently 
and one of the solvers waits until the other finishes its calculation. Then they 
exchange the required data with each other for the next time step. By doing so, the 
parallel integration can reduce the idle time since only one cube may have partial 
idle time. The resulting speedup achieved by the parallel integration scheme is 
theoretically by a factor of almost 2, provided that computational times required 
for the" fluid and structural domains are well balanced. 
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COMPUTATIONAL RESULTS 


To demonstrate an aeroelastic computation, a clipped delta wing of aspect 
ratio 3 and taper ratio 1/7 with the NACA 65A006 airfoil section was selected. 
The sweep angle at the quarter chord line (yl c / 4 ) is 45°. The transonic flutter 
characteristics of this wing are available from wind tunnel tests (ref. 23) for various 
flow parameters. 

In this computation, the flow field is discretized by using a C-H grid topology 
of size 151 x 30 x 25. The CFD grid at the root and the upper surface of the wing 
is shown in figure 3. The CFD grid is assigned to 32 processors on the iPSC/860. 
The processors are arranged as a three-dimensional mesh of eight processors in the 
chordwise direction and two processors in each of the spanwise and surface- normal 
directions. This same arrangement for fluids is kept throughout the computations 
so that the performance of the structural domain can be studied in detail. 

A 20-DOF ANS4 shell element (ref. 24) was used for the finite element mod- 
eling of the wing structure. The wing is modeled as a plate. Considering the wing 
structure used in the experiment, variation of mass density is allowed along the 
chordwise and spanwise directions. But the thickness of the finite element model 
is kept constant. This is based on assumptions that the stiffness of the wing is 
dominated by the aluminum-alloy insert and that mass distribution of the wing is 
significantly changed as a result of plastic foams covering the aluminum- alloy insert. 
This finite element plate model predicts natural vibration modes of the wing that 
compare well with the experiment. The first three modal frequencies computed by 
using the finite element model are 21.8, 78.1, and 126 Hz, and corresponding values 
measured in the experiment are 21.6, 79.7, and 121 Hz, respectively. 

Modal Analysis 

The first parallel version of ENSAERO was capable of using the Euler equa- 
tions for fluids and the modal equations for structures. Using this version of 
ENSAERO, aeroelastic responses were computed. In figure 4, the computed gen- 
eralized displacements of the first three modes for the wing are presented. The 
results were obtained for a freestream Mach number (M^) of 0.854 and a given 
dynamic pressure (P) of 0.7 psi. In this calculation, the first six mode shapes of 
the wing were used to predict the structural dynamic responses. Identical results 
were obtained from serial and parallel computers. At this point, it is verified that 
the fluid and structural domains of the parallel ENSAERO and of the intercube 
communication mechanism are properly working. It should be noted that only one 
processor is assigned for the structural analysis since only six mode shapes are used 
to represent the structural properties of the wing. The wall-clock times required 
for each integration step on a single processor of the Cray Y-MP and on the Intel 
iPSC/860 are 1.36 and 3.03 seconds, respectively. 
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The proposed parallel integration scheme is compared with the sequential 
integration scheme used on serial computers. Using both sequential and paral- 
lel integration schemes, aeroelastic responses of the wing were computed on the 
iPSC/860. Aeroelastic responses were computed by simulating experimental con- 
ditions for a freestream Mach number of 0.977 and for a given dynamic pressure 
of 0.65 psi. Results from sequential and parallel integration schemes agree well, as 
shown in figure 5. Since only six modes were used for the modal analysis, the reduc- 
tion in computational time was marginal (less than 2% of the time per integration 
step used in the sequential integration scheme). By increasing the number of modes 
to 50, the sequential integration scheme required 5% more computational time per 
integration step whereas the time for the parallel integration scheme remained the 
same. This trend is more evident as the number of equations increases. 

Finite Element Analysis 

For simple geometries such as rectangular wings, the modal analysis can 
accuratly predict dynamic responses. However, the modal analysis with a limited 
number of preselected mode shapes may be less accurate for complex structures 
such as wing-body configurations. In order to accurately represent aeroelastic re- 
sponses of general aircraft configurations, the modal analysis was replaced with the 
direct finite element analysis in ENSAERO. The finite element equations were first 
tested on the Cray Y-MP version of the code and then were parallelized on the 
Intel iPSC/860. 

The lumped load and virtual surface interfaces on the Y-MP are shown in 
figures 6 and 7. The wing structure was modeled using 100 ANS4 elements. Ten 
elements each were assigned along the chordwise and spanwise directions, respec- 
tively. The time history of total lift on the wing for a given dynamic pressure of 
1.0 psi and initial acceleration of 1.0 x 10 5 inches/sec 2 , is presented in figure 6. 
The exact solution is the total lift obtained by integrating pressure coefficients at 
CFD grid points. Both virtual surface (VS) and lumped load (LL) interfaces obtain 
the total lift by summing the forces at the finite element (FE) nodal points, which 
were transformed from pressure coefficients through interfaces. The virtual surface 
interface transfers pressure data more accurately than the lumped load interface. 
The lumped load interface shows a favorable result although the response around 
peaks deviates from the exact solution. In addition, the tip displacements of the 
wing at the leading edge are presented in figure 7. The lumped load approach shows 
favorable agreement with the virtual surface approach. 

The lumped load approach was first used as the fluid-structural interface for 
the finite element equations on the iPSC/860. The choice of interfaces was made 
largely on the basis of memory use. The size of the mapping matrix for the virtual 
surface interface becomes too large to fit on a single processor on the iPSC/860 when 
the number of fluid grid points or structural nodal points on the wing increases. 
However, the lumped load approach requires only a small amount of memory to 
identify the location of fluid grid points on a finite element discretization. 
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In order to check the parallel implementation of the finite element equations, 
the tip displacements of the wing were computed, without aerodynamic forces, on 
the Y-MP and the iPSC/860. This wing was modeled using 64 ANS4 elements. For 
this computation, processors were assigned as a 2-D mesh of two processors in the 
chordwise and spanwise directions, respectively, on the iPSC/860. Identical results 
were obtained on both the Y-MP and the iPSC/860. 

For the structural model using finite element equations, the aeroelastic re- 
sponses were obtained using both sequential and parallel integration schemes on 
the iPSC/860. The results are presented in figure 8. The responses were ob- 
tained for a given dynamic pressure of 1.0 psi. The same finite element mesh 
and processor arrangement used in the previous case are used for this computation. 
The two results agree well. The wall-clock times per integration step achieved are 
3.45 and 3.00 seconds by using sequential and parallel integration schemes, respec- 
tively. The speedup is still marginal since the total number of equations is relatively 
small (360 DOF) for the structural dynamic analysis. However, for 256 finite el- 
ements (1360 DOF) with four processors on the structural domain, the wall-clock 
times per integration step are 6.22 and 3.33 seconds by using sequential and par- 
allel integration schemes, respectively. The speedup achieved is 1.87 by using the 
parallel integration scheme. When the computational time between the fluid and 
structural domains is balanced, maximum speedup can be achieved. 

The parallel integration scheme enables the combination of advanced CFD 
and CSD technologies with minimal increase in the computational time per integra- 
tion step. The required computational time per integration step is determined by 
both the fluid and structural domains on serial computers. However, using the par- 
allel integration scheme on MIMD computers, the time is solely determined by the 
computational domain that requires more time per integration step. This parallel 
integration is one of the advantages of using MIMD computers for multidisciplinary 
analysis. 

Aeroelastic responses were also computed for various dynamic pressures in 
order to predict the flutter dynamic pressure. Figure 9 shows the stable, near 
neutrally stable, and unstable responses of wing tip displacements at the leading 
edge for dynamic pressures of 0.80, 0.85, and 0.90 psi, respectively. From the 
responses shown in figure 9, the interpolated dynamic pressure for the neutrally 
stable condition is 0.84 psi. It is noted that the experimental dynamic pressure 
measured at the neutrally stable condition was 0.91 psi. Considering the lack of 
experimental pressure data on the wing and the error involved in modeling the 
wing as a plate with constant thickness, the computational result is an acceptable 
prediction of the flutter dynamic pressure. 

Performance 

In order to support multidisciplinary analysis with practical computational 
turnaround times for design work, the computational domain on parallel machines 
must be scalable. This section describes several aspects of performance, including 
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single processor computational rates, domain decomposition strategy, and scalabil- 
ity of the structural domain in ENSAERO on the iPSC/860. It is noted that only 
a linear model is used for the dynamic analysis of the structural domain. The per- 
formance of the fluid domain can be found in reference 21. All performance data 
reported are for 64-bit arithmetic. 

In order to measure the performance of the structural domain on the Intel 
iPSC/860, the identical code, except message passing routines, was run on the 
Y-MP and the averaged time per integration step was obtained. All FLOP rates 
quoted are calculated by comparing the time per integration step on the iPSC/860 
with that on the Y-MP using a single processor. Operation counts from the Cray 
Hardware Performance Monitor are used. 

A single processor of the iPSC/860 was able to house 256 ANS4 elements. 
The Y-MP equivalent MFLOPS obtained is about 4.2 MFLOPS for this size of 
problem, and the corresponding rate is about 77 MFLOPS on a single Y-MP pro- 
cessor. This rate is about 7% of the peak performance of a single processor on the 
iPSC/860. Similar performance was reported by Ryan and Weeratunga for the fluid 
domain (ref. 21). 

The performance of the structural domain in parallel ENSAERO has been 
measured over a wide range of processor numbers and problem sizes as shown in 
figure 10. The speedup relative to the Y-MP is defined as 


speedup = 


tCray 
t Intel 


where t C ray and t lnte i are the computational time per integration step measured 
on the Y-MP and the iPSC/860, respectively. Only a single processor is used 
to measure t Cr ay on the Y-MP. The open and filled symbols denote the domain 
decomposition which results in the minimum and maximum bandwidths of the 
stiffness matrix of each subdomain for a given number of processors. 

For the case of 1,360 DOF, the computational time per integration step on 
the iPSC /860 is barely closed to that on the Y-MP when 64 processors are in use. 
However, as increasing the size of problem (10,560 and 20,800 DOF), the iPSC/860 
achieves about the speed of the Y-MP by using 16 processors. It is evident that 
the JPCG solver on the iPSC/860 performs better as the size of problem increases. 
For the case of 20,800 DOF, the relative speedup achieved is about 8 by the time 
64 processors are in use. 

For a given number of processor, the obtained speedup varies depending on 
the domain decomposition strategy as shown in figure 10. Only the results for the 
minimum and maximum bandwidths are presented for clarity. The speedup in- 
creases as decreasing the matrix bandwidth of each subdomain for a given number 
of processors on the iPSC/860. This is due to the fact that the conjugate gradient 
algorithm is subjected to the multiplication of the coefficient matrix and a trial vec- 
tor. Since the multiplication is performed only at the subdomain level in the JPCG 
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solver, the smaller bandwidth results in fewer operations and quicker computational 
time. 

The overall performance of ENSAERO on both the Y-MP and the iPSC/860 
is shown in figure 11 for the case of 113,250 grid points for the fluids and 10,560 DOF 
for the structure. In this computation, 32 processors are assigned to the fluid domain 
and 16 to 64 processors to the structural domain. Both the skyline reduction and 
JPCG solvers are compared on the Y-MP but only the JPCG solver is used for the 
structural domain on the iPSC/860 at the present time. A parallel version of the 
skyline reduction solver is under implementation. 

The height of each column is the time per integration step. Each column is 
divided into zones representing the time spent for the fluid domain, the structural 
domain, and for idle/intercube communication. It should be noted that the time per 
integration step for the skyline reduction solver included only time spent for forward 
reduction and back substitution without the factorization time. The reason is that 
the contribution of the factorization time to the computation time per integration 
step is negligible when a large number of time steps are required to obtain linear 
dynamic responses. Providing that 7,000 steps are required for a typical aeroelastic 
computation of the given wing, the increase of the time per integration step is about 
0.5% of the time used for structure. 

It is evident that the skyline reduction solver outperforms the JPCG solver 
on the Y-MP. However, the JPCG solver is first implemented on the iPSC/860. The 
reason is that it is desirable to compare the performance of the two solvers on the 
iPSC/860. The JPCG solver on the iPSC/860 could not achieve the performance of 
the skyline reduction solver without including the factorization time on the Y-MP. 
However, as far as the JPCG solver is concerned, 16 processors on the iPSC/860 can 
obtain the performance of a single processor on the Y-MP. In addition, the overall 
performance of ENSAERO using 96 processors on the iPSC/860 is about one-third 
of that obtained using the skyline reduction solver with a single processor on the 
Y-MP. This result is based on the averaged time per integration step. It should be 
noted that the structural domain determined the time per integration step for this 
particular problem on the iPSC/860. Most of the time on the fluid domain was spent 
waiting for the interface boundary data. This means that fewer processors can be 
assigned to the fluid domain without sacrificing computational performance as long 
as the memory on each processor can accommodate the assigned grid partitioning. 

CONCLUSIONS 

A parallel version of a multidisciplinary code, ENSAERO, was developed 
on the Intel iPSC/860. A domain decomposition approach was used to enable 
the fluid and structural domains to be developed and maintained independently. 
This approach provides an efficient and effective environment to researchers. A 
researcher concerned with the fluid or the structural domain can develop his own 
discipline independent of the others. The only thing to be done together is coupling 
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of the disciplines. Since coupling of the disciplines is achieved by an intercube 
communication mechanism, coupling should not cause any problem as long as each 
domain maintains the specification for intercube communication. This makes it easy 
for each discipline to incorporate and develop new algorithms or data structures 
without interferences. 

In addition to the modal analysis, the capability of finite element analysis 
for structural dynamic analysis has been added to parallel ENSAERO. The struc- 
tural domain on the iPSC/860 is again divided into a number of subdomains. The 
solution of the structural domain is obtained by the Jacobi preconditioned conju- 
gate gradient (JPCG) solver. The partition of the structural domain for a wing 
is two-dimensional uni-partitioning since the wing is modeled as a plate and the 
discretization is a regular mesh. This enables the message communication within 
the structural domain to be very simple and efficient. 

As far as the structural domain is concerned, the performance on the 
iPSC/860 is still far behind the best performance of the Y-MP a result of the 
poor performance of the JPCG algorithm. A parallel version of the skyline reduc- 
tion solver is being implemented for the structural domain on the iPSC/860. This 
will provide increased performance for the structural domain on the iPSC/860. For 
a problem size of 10,560 DOF, the overall performance of parallel ENSAERO using 
96 processors on the iPSC/860 is about one-third of that obtained using the skyline 
reduction solver for the structural domain on a single Y-MP processor. 

The parallel integration scheme enables the combination of advanced CFD 
and CSD technologies with minimal increase in computational time per integration 
step. The computational time per integration step is solely determined by the do- 
main that requires more computational time on the iPSC/860 whereas that time it 
is determined by both domains on serial computers. This parallel integration is one 
of the advantages of using MIMD computers for multidisciplinary analysis. The pro- 
cedure developed in this research will provide an efficient tool for solving aeroelastic 
problems of complete aerospace vehicle configurations on MIMD computers. 
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APPENDIX 


1. Set diagonal elements of global stiffness matrix 

Send {Kj j} to neighboring processors 
Receive {Kjj} from neighboring processors 
{<*'} = E {Kjj} + (K^} 

2. Set initial trial and residual vectors 

{?'} = 0 
M = { , '} 

3. Set initial search direction 

Send {r e } to neighboring processors 
Receive {r e } from neighboring processors 
{s e } = £{r 1 } + {r e } 
z ] = s j/dj, j = l,...,neq e 
p e = {r e } T {z e } 

7 o = £ p e , e = 1, ...,np (global sum) 

1 = lo 

{P e } = M 

4. Operations at subdomain level 

{«*"} = IK‘}{P‘) 

P‘ = { p'} t K } 

a’ - 0‘/y 

5. Update solution and residual 

l/c* = £<7 e , e = 1, np (global sum) 

{q e } = {q e } +<*{p e } 

{r e } = {r e } — c*{u e } 

6. Update search direction and check convergence 

Send {r e } to neighboring processors 
Receive {r e } from neighboring processors 
{s e } = £{r 1 } 4- {r e } 
z* = j = l,-..,neq e 

p e = {r e } T {z e } 

Inew = £ P e , e = 1, ...,np (global sum) 
IF( inew/lo < TOLERANCE) STOP 

{p e } ={«*}+ (inew/ 1 ) {p e } 

1 = Inew 

7. Repeat 4 to 6 until converged 
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3-D uni-partitioning 
(Fluid domain) 


2-D uni-partitioning 
(Structural domain) 


Figure 1. Processor arrangements and message exchanges through interprocessor 
and intercube communications. 
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(a) Sequential integration 



(b) Parallel Integration 


Figure 2. Flow diagrams for sequential and parallel integration schemes. 
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Figure 3. The CFD grid at the root and surface of a clipped delta wing with the 
NACA 65A006 airfoil section. (Aspect ratio = 3.0, Taper ratio = 1/7, A c/ a = 45°) 
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Figure 4. Comparison among the first three generalized displacement histories ob- 
tained by the serial and parallel ENSAERO codes. (M^ = 0.854, P = 0.70 psi) 
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Figure 5. Generalized displacement histories obtained from sequential and parallel 
integration schemes. (Moo = 0.977, P = 0.65 psi) 
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Figure 6. Comparison of total lift responses. 
(Moo = 0.854, a = 0 deg., P = 1.0 psi) 



Figure 7. Comparison of wing tip displace- 
ments at the leading edge obtained by us- 
ing virtual surface and lumped load inter- 
faces. (Moo = 0.854, a = 0 deg., 
P = 1.0 psi) 



Figure 8. Aeroelastic responses obtained 
by using sequential and parallel integra- 
tion schemes. (Moo = 0.854, a = 0 deg., 
P = 1.0 psi) 


Figure 9. Aeroelastic responses of a clipped 
delta wing at three different dynamic pres- 
sures using Euler equations for the fluid 
and finite element equations for the struc- 
tural domains. (Moo = 0.854, a = 0 deg.) 
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Number of processors 


Figure 10. Computational performance of the structural domain in ENSAERO with 
various problem sizes and domain decompositions on the Intel iPSC/860. 



(Y-MP) 


Number of processors (IPSC/860) 


Figure 11. Overall computational performance of ENSAERO on the Cray Y-MP 
and the Intel iPSC/860. 
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